9 research outputs found

    Scalable reconfigurable computing leveraging latency-insensitive channels

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 190-197).
    Traditionally, FPGAs have been confined to the limited role of small, low-volume ASIC replacements and circuit emulators. However, continued Moore's law scaling has given FPGAs new life as accelerators for applications that map well to fine-grained parallel substrates. Examples of such applications include processor modelling, compression, and digital signal processing. Although FPGAs continue to increase in size, some interesting designs still fail to fit into a single FPGA. Many tools exist that partition RTL descriptions across FPGAs. Unfortunately, existing tools have low performance due to the inefficiency of maintaining the cycle-by-cycle behavior of RTL among discrete FPGAs. These tools are unsuitable for use in FPGA program acceleration, as the purpose of an accelerator is to make applications run faster. This thesis presents latency-insensitive channels, a language-level mechanism by which programmers express points in their design at which the cycle-by-cycle behavior of the design may be modified by the compiler. By decoupling the timing of portions of the RTL from the high-level function of the program, designs may be mapped to multiple FPGAs without suffering the performance degradation observed in existing tools. This thesis demonstrates, using a diverse set of large designs, that FPGA programs described in terms of latency-insensitive channels obtain significant gains in design feasibility, compilation time, and run time when mapped to multiple FPGAs.
    by Kermin Elliott Fleming, Jr., Ph.D.
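    As a rough software analogy (not the thesis's actual implementation, which targets FPGA toolflows), a latency-insensitive channel can be modelled as a bounded FIFO whose per-message transit delay the platform is free to vary, for example when the channel crosses an FPGA boundary, without changing the sequence of values the receiver observes. The sketch below uses invented names and illustrates only that semantic contract.

        import collections
        import random

        class LIChannel:
            """Toy model of a latency-insensitive channel: a bounded FIFO whose
            transit delay may vary without changing what the receiver observes."""

            def __init__(self, capacity=4, max_delay=3):
                self.capacity = capacity
                self.max_delay = max_delay
                self.in_flight = collections.deque()  # (ready_time, value)
                self.now = 0

            def can_enq(self):
                return len(self.in_flight) < self.capacity

            def enq(self, value):
                # The compiler/platform may choose any finite delay here.
                self.in_flight.append((self.now + random.randint(1, self.max_delay), value))

            def deq(self):
                # Deliver strictly in FIFO order, and only once the head is ready.
                if self.in_flight and self.in_flight[0][0] <= self.now:
                    return self.in_flight.popleft()[1]
                return None  # not ready this cycle; the receiver must stall

            def tick(self):
                self.now += 1

        # Producer and consumer agree only on the sequence of values,
        # never on the cycle at which each value arrives.
        ch = LIChannel()
        to_send, received = collections.deque(range(8)), []
        while len(received) < 8:
            if to_send and ch.can_enq():
                ch.enq(to_send.popleft())
            v = ch.deq()
            if v is not None:
                received.append(v)
            ch.tick()
        print(received)  # always [0, 1, ..., 7], regardless of per-message delays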

    WiLIS: Architectural Modeling of Wireless Systems

    The performance of a wireless system depends on the wireless channel as well as the algorithms used in the transceiver pipelines. Because physical phenomena affect transceiver pipelines in difficult-to-predict ways, detailed simulation of the entire transceiver system is needed to evaluate even a single processing block. Further, some protocol validations require simulation of rare events (say, 1 bit error in 10^9 bits), which means the simulation must run long enough for such events to materialize. This requirement, coupled with the heavy computation typical of most physical-layer processing, rules out pure software solutions. In this paper we describe WiLIS, an FPGA-based hybrid hardware-software system designed to facilitate the development of wireless protocols. We then use WiLIS to evaluate several microarchitectures for measuring very low bit-error rates (BER). We demonstrate, for the first time, that the recently proposed SoftPHY can be implemented efficiently in hardware.
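    A back-of-the-envelope calculation (with assumed throughput figures, not numbers from the paper) shows why the rare-event regime rules out pure software simulation: observing even a handful of errors at a bit-error rate near 10^-9 requires pushing on the order of 10^10 bits through the full transceiver pipeline.

        # Illustrative feasibility estimate; the throughput figures are assumptions.
        target_ber = 1e-9          # rare-event regime mentioned in the abstract
        errors_wanted = 10         # enough observed errors for a crude BER estimate
        bits_needed = errors_wanted / target_ber        # ~1e10 bits

        sw_bits_per_sec = 1e4      # assumed detailed software PHY simulation speed
        fpga_bits_per_sec = 1e7    # assumed FPGA-accelerated simulation speed

        print(f"software: {bits_needed / sw_bits_per_sec / 86400:.1f} days")   # ~11.6 days
        print(f"FPGA:     {bits_needed / fpga_bits_per_sec / 3600:.1f} hours") # ~0.3 hours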

    A hardware spinal decoder

    Spinal codes are a recently proposed class of capacity-achieving rateless codes. While hardware encoding of spinal codes is straightforward, the design of an efficient, high-speed hardware decoder poses significant challenges. We present the first such decoder. By relaxing data dependencies inherent in the classic M-algorithm decoder, we obtain area and throughput competitive with 3GPP turbo codes as well as greatly reduced latency and complexity. The enabling architectural feature is a novel alpha-beta incremental approximate selection algorithm. We also present a method for obtaining hints which anticipate successful or failed decoding, permitting early termination and/or feedback-driven adaptation of the decoding parameters. We have validated our implementation in FPGA with on-air testing. Provisional hardware synthesis suggests that a near-capacity implementation of spinal codes can achieve a throughput of 12.5 Mbps in a 65 nm technology while using substantially less area than competitive 3GPP turbo code implementations.
    Irwin Mark Jacobs and Joan Klein Jacobs Presidential Fellowship; Intel Corporation (Fellowship); Claude E. Shannon Research Assistantship.
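    For context, the classic M-algorithm that the decoder relaxes keeps only the M lowest-cost paths at each decoding step; the global selection over all expanded candidates is the serial dependency the paper's alpha-beta incremental approximate selection is designed to break. The sketch below is a plain software rendering of the exact (unrelaxed) step with a made-up branch metric, not the hardware architecture.

        import heapq

        def m_algorithm_step(survivors, branch_cost, symbols, M):
            """One exact M-algorithm step: expand every surviving path by every
            candidate symbol, then keep only the M cheapest expansions."""
            candidates = []
            for path, cost in survivors:
                for s in symbols:
                    candidates.append((path + (s,), cost + branch_cost(path, s)))
            # This full selection is what the hardware decoder approximates.
            return heapq.nsmallest(M, candidates, key=lambda pc: pc[1])

        def toy_cost(path, s):
            # Placeholder branch metric, for illustration only.
            return (sum(path) + s) % 7

        survivors = [((), 0.0)]
        for _ in range(4):
            survivors = m_algorithm_step(survivors, toy_cost, symbols=[0, 1], M=4)
        print(survivors[0])  # best surviving path and its accumulated cost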

    Leveraging latency-insensitivity to ease multiple FPGA design

    Traditionally, hardware designs partitioned across multiple FPGAs have had low performance due to the inefficiency of maintaining cycle-by-cycle timing among discrete FPGAs. In this paper, we present a mechanism by which complex designs may be efficiently and automatically partitioned among multiple FPGAs using explicitly programmed latency-insensitive links. We describe the automatic synthesis of an area-efficient, high-performance network for routing these inter-FPGA links. By mapping a diverse set of large research prototypes onto a multiple-FPGA platform, we demonstrate that our tool obtains significant gains in design feasibility, compilation time, and even wall-clock performance.
    Intel Corporation Graduate Fellowship.
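    One assumed way to picture such a network (an illustrative scheme, not the paper's synthesized implementation) is several logical latency-insensitive links multiplexed over a single physical inter-FPGA connection, with per-link credits so that back-pressure on one link cannot block the others.

        import collections

        class MuxedLink:
            """Sketch: logical channels sharing one physical inter-FPGA connection.
            Per-channel credits model receiver buffer space (assumed scheme)."""

            def __init__(self, num_channels, buffer_slots=4):
                self.credits = [buffer_slots] * num_channels                # sender side
                self.rx_buffers = [collections.deque() for _ in range(num_channels)]

            def try_send(self, channel, value):
                if self.credits[channel] == 0:
                    return False                 # receiver buffer full: back-pressure
                self.credits[channel] -= 1
                # On hardware this flit would be serialized over the shared cable.
                self.rx_buffers[channel].append(value)
                return True

            def receive(self, channel):
                if not self.rx_buffers[channel]:
                    return None
                value = self.rx_buffers[channel].popleft()
                self.credits[channel] += 1       # credit returns to the sender
                return value

        link = MuxedLink(num_channels=2)
        link.try_send(0, "flit-a")
        link.try_send(1, "flit-b")
        print(link.receive(1), link.receive(0))  # each channel keeps its own FIFO order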

    Scalable multi-access flash store for big data analytics

    For many "Big Data" applications, the limiting factor in performance is often the transportation of large amount of data from hard disks to where it can be processed, i.e. DRAM. In this paper we examine an architecture for a scalable distributed flash store which aims to overcome this limitation in two ways. First, the architecture provides a highperformance, high-capacity, scalable random-access storage. It achieves high-throughput by sharing large numbers of flash chips across a low-latency, chip-to-chip backplane network managed by the flash controllers. The additional latency for remote data access via this network is negligible as compared to flash access time. Second, it permits some computation near the data via a FPGA-based programmable flash controller. The controller is located in the datapath between the storage and the host, and provides hardware acceleration for applications without any additional latency. We have constructed a small-scale prototype whose network bandwidth scales directly with the number of nodes, and where average latency for user software to access flash store is less than 70?s, including 3.5?s of network overhead.Quanta Computer Incorporated (#6922986)Samsung Electronics Co. (#6925093

    Airblue: A System for Cross-Layer Wireless Protocol Development

    Over the past few years, researchers have developed many cross-layer wireless protocols to improve the performance of wireless networks. Experimental evaluations of these protocols have been carried out mostly using software-defined radios, which are typically two to three orders of magnitude slower than commodity hardware. FPGA-based platforms provide much better speeds but are quite difficult to modify because of the way high-speed designs are typically implemented. Experimenting with cross-layer protocols requires a flexible way to convey information beyond the data itself from lower to higher layers, and a way for higher layers to configure lower layers dynamically and within some latency bounds. One also needs to be able to modify a layer's processing pipeline without triggering a cascade of changes. We have developed Airblue, an FPGA-based software radio platform that has all these properties and runs at speeds comparable to commodity hardware. We discuss the design philosophy underlying Airblue that makes it relatively easy to modify, and present early experimental results.
    National Science Foundation (U.S.) (NSF grant CNS-0721702); National Science Foundation (U.S.) (NSF grant CCF-0541164); National Science Foundation (U.S.) (NSF grant CCF-0811696).
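    As a toy illustration of the cross-layer idea (hypothetical message types, not Airblue's actual interfaces), the lower layer can attach per-bit confidence hints, in the spirit of SoftPHY, to the data it passes up, while the higher layer sends small control messages back down to reconfigure the pipeline.

        from dataclasses import dataclass
        from typing import List

        @dataclass
        class PhyToMacFrame:
            """Hypothetical upward message: decoded bits plus per-bit confidence
            hints, so the MAC can react to likely errors, not just a CRC result."""
            bits: List[int]
            confidence: List[float]   # e.g. per-bit log-likelihood magnitudes

        @dataclass
        class MacToPhyControl:
            """Hypothetical downward control message for dynamic reconfiguration."""
            modulation: str = "qpsk"
            coding_rate: float = 0.5

        frame = PhyToMacFrame(bits=[1, 0, 1], confidence=[4.2, 0.3, 5.1])
        suspect = [i for i, c in enumerate(frame.confidence) if c < 1.0]
        print("low-confidence bit positions:", suspect)  # a MAC policy could act on these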

    Implementing a fast cartesian-polar matrix interpolator

    The 2009 MEMOCODE Hardware/Software Co-Design Contest assignment was the implementation of a cartesian-to-polar matrix interpolator. We discuss our hardware and software design submissions.
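    The abstract does not reproduce the contest specification; under the assumed reading that the task resamples a matrix stored on a Cartesian grid at polar sample points, a reference software version might look like the following bilinear-interpolation sketch.

        import math

        def bilinear_sample(grid, x, y):
            """Bilinearly interpolate grid (a list of rows) at fractional (x, y)."""
            x0, y0 = int(math.floor(x)), int(math.floor(y))
            x1, y1 = min(x0 + 1, len(grid[0]) - 1), min(y0 + 1, len(grid) - 1)
            fx, fy = x - x0, y - y0
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bot = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            return top * (1 - fy) + bot * fy

        def cartesian_to_polar(grid, num_r, num_theta):
            """Resample a Cartesian matrix onto an (r, theta) grid centred on the
            matrix centre. An assumed reading of the task, not the contest spec."""
            cy, cx = (len(grid) - 1) / 2.0, (len(grid[0]) - 1) / 2.0
            rmax = min(cx, cy)
            out = []
            for ri in range(num_r):
                r = rmax * ri / max(num_r - 1, 1)
                row = []
                for ti in range(num_theta):
                    theta = 2 * math.pi * ti / num_theta
                    row.append(bilinear_sample(grid, cx + r * math.cos(theta),
                                                     cy + r * math.sin(theta)))
                out.append(row)
            return out

        grid = [[float(4 * i + j) for j in range(4)] for i in range(4)]
        print(cartesian_to_polar(grid, num_r=3, num_theta=4)[0])  # innermost (centre) ring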